-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(plugin-elasticsearch): Enable case senstivity support in the Elasticsearch connector #26352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideEnable configurable case-sensitive name matching in the Elasticsearch connector by introducing a new flag in configuration, updating metadata resolution to normalize identifiers based on this flag, and adding supporting tests and test harness adjustments. Sequence diagram for identifier normalization in metadata resolutionsequenceDiagram
participant "ConnectorSession"
participant "ElasticsearchMetadata"
participant "ElasticsearchConfig"
"ConnectorSession"->>"ElasticsearchMetadata": Request table/column metadata
"ElasticsearchMetadata"->>"ElasticsearchConfig": Check caseSensitiveNameMatching
"ElasticsearchMetadata"->>"ElasticsearchMetadata": normalizeIdentifier(session, identifier)
"ElasticsearchMetadata"-->>"ConnectorSession": Return normalized metadata
Class diagram for updated ElasticsearchConfig and ElasticsearchMetadataclassDiagram
class ElasticsearchConfig {
- boolean ignorePublishAddress
- boolean verifyHostnames
- Security security
+ boolean caseSensitiveNameMatching
+ boolean isCaseSensitiveNameMatching()
+ ElasticsearchConfig setCaseSensitiveNameMatching(boolean)
}
class ElasticsearchMetadata {
- ElasticsearchClient client
- String schemaName
- Type ipAddressType
+ boolean caseSensitiveNameMatching
+ String normalizeIdentifier(ConnectorSession, String)
}
ElasticsearchMetadata --> ElasticsearchConfig: uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Please include documentation for this connector configuration property. Two doc examples can be found in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/cassandra.rst#configuration-properties and https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/bigquery.rst#configuration-properties. |
…asticsearch connector
e62b68f
to
d6bcf3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Change @config("case-sensitive-name-matching") to @config("elasticsearch.case-sensitive-name-matching") so the new setting uses the same prefix as other Elasticsearch connector properties.
- In ElasticsearchQueryRunner you build the config map then immediately create a new variable
newconfig
for the same map; consider building and passing the map in one step to avoid the redundant local.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Change @Config("case-sensitive-name-matching") to @Config("elasticsearch.case-sensitive-name-matching") so the new setting uses the same prefix as other Elasticsearch connector properties.
- In ElasticsearchQueryRunner you build the config map then immediately create a new variable `newconfig` for the same map; consider building and passing the map in one step to avoid the redundant local.
## Individual Comments
### Comment 1
<location> `presto-elasticsearch/src/main/java/com/facebook/presto/elasticsearch/ElasticsearchConfig.java:59` </location>
<code_context>
private boolean ignorePublishAddress;
private boolean verifyHostnames = true;
private Security security;
+ private boolean caseSensitiveNameMatching;
@NotNull
</code_context>
<issue_to_address>
**suggestion:** Default value for caseSensitiveNameMatching is not set.
Initialize caseSensitiveNameMatching to false to clarify its default behavior and prevent ambiguity.
```suggestion
private boolean caseSensitiveNameMatching = false;
```
</issue_to_address>
### Comment 2
<location> `presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch/TestElasticsearchMixedCaseTest.java:41` </location>
<code_context>
+ .setCaseSensitiveNameMatching(false));
}
@Test
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding negative tests for unsupported index name cases.
Add a test that uses uppercase or mixed-case index names to verify the connector returns a clear error or fails gracefully, ensuring compliance with Elasticsearch's lowercase index name requirement.
```suggestion
@Test
public void testUppercaseIndexNameFails() throws Exception
{
String invalidIndexName = "TestIndex";
try {
// Attempt to create an index with uppercase letters
client.indices().create(new org.elasticsearch.client.indices.CreateIndexRequest(invalidIndexName), DEFAULT);
// If no exception, fail the test
assertTrue(false, "Expected exception for uppercase index name");
}
catch (org.elasticsearch.ElasticsearchStatusException e) {
// Elasticsearch should reject uppercase index names
assertTrue(e.getMessage().contains("Invalid index name"), "Error message should mention invalid index name");
}
catch (Exception e) {
// Other exceptions may also be thrown, but should indicate invalid index name
assertTrue(e.getMessage().toLowerCase().contains("invalid") || e.getMessage().toLowerCase().contains("uppercase"), "Error message should mention invalid index name or uppercase");
}
}
@Test
```
</issue_to_address>
### Comment 3
<location> `presto-docs/src/main/sphinx/connector/elasticsearch.rst:55` </location>
<code_context>
``elasticsearch.max-http-connections`` Maximum number of persistent HTTP connections to Elasticsearch.
``elasticsearch.http-thread-count`` Number of threads handling HTTP connections to Elasticsearch.
``elasticsearch.ignore-publish-address`` Whether to ignore the published address and use the configured address.
+``case-sensitive-name-matching`` Enable case sensitive identifier support for schema and column names for the connector.
+ When disabled, names are matched case-insensitively using lowercase normalization.
+ Default is ``false``.
</code_context>
<issue_to_address>
**nitpick (typo):** Consider hyphenating 'case sensitive' to 'case-sensitive' for grammatical correctness.
As a compound adjective preceding 'identifier support', 'case-sensitive' is the correct form.
```suggestion
``case-sensitive-name-matching`` Enable case-sensitive identifier support for schema and column names for the connector.
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
private boolean ignorePublishAddress; | ||
private boolean verifyHostnames = true; | ||
private Security security; | ||
private boolean caseSensitiveNameMatching; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Default value for caseSensitiveNameMatching is not set.
Initialize caseSensitiveNameMatching to false to clarify its default behavior and prevent ambiguity.
private boolean caseSensitiveNameMatching; | |
private boolean caseSensitiveNameMatching = false; |
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local doc build.
Applied sourcery-ai suggestion to hyphenate case-sensitive
, found no other concerns and am approving the doc. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes lgtm
Description
Elasticsearch only supports Mixedcase support for column names and schema names.
Index Names in Elasticsearch is mapped as Table names for Elasticsearch connector in Presto. As per the official documentation , the index name should be in lowercase only , therefore , Uppercase and Mixed case will not be supported.
https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create#operation-indices-create-path
Motivation and Context
Impact
Test Plan
Test cases have been introduced and are passing successfully.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.